
neural network language model




Testing learning hypotheses using neural networks by manipulating learning data

Leong, Cara Su-Yi, Linzen, Tal

arXiv.org Artificial Intelligence

Although passivization is productive in English, it is not completely general -- some exceptions exist (e.g. *One hour was lasted by the meeting). How do English speakers learn these exceptions to an otherwise general pattern? Using neural network language models as theories of acquisition, we explore the sources of indirect evidence that a learner can leverage to learn whether a verb can passivize. We first characterize English speakers' judgments of exceptions to the passive, confirming that speakers find some verbs more passivizable than others. We then show that a neural network language model can learn restrictions to the passive that are similar to those displayed by humans, suggesting that evidence for these exceptions is available in the linguistic input. We test the causal role of two hypotheses for how the language model learns these restrictions by training models on modified training corpora, which we create by altering the existing training corpora to remove features of the input implicated by each hypothesis. We find that while the frequency with which a verb appears in the passive significantly affects its passivizability, the semantics of the verb does not. This study highlights the utility of altering a language model's training data for answering questions where complete control over a learner's input is vital.


Tuor

AAAI Conferences

Automated analysis methods are crucial aids for monitoring and defending a network to protect the sensitive or confidential data it hosts. This work introduces a flexible, powerful, and unsupervised approach to detecting anomalous behavior in computer and network logs; one that largely eliminates domain-dependent feature engineering employed by existing methods. By treating system logs as threads of interleaved "sentences" (event log lines) to train online unsupervised neural network language models, our approach provides an adaptive model of normal network behavior. We compare the effectiveness of both standard and bidirectional recurrent neural network language models at detecting malicious activity within network log data. Extending these models, we introduce a tiered recurrent architecture, which provides context by modeling sequences of users' actions over time. Compared to Isolation Forest and Principal Components Analysis, two popular anomaly detection algorithms, we observe superior performance on the Los Alamos National Laboratory Cyber Security dataset. For log-line-level red team detection, our best performing character-based model provides test set area under the receiver operating characteristic curve of 0.98, demonstrating the strong fine-grained anomaly detection performance of this approach on open vocabulary logging sources.
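The core idea above -- score each log line by how unlikely a language model trained on normal traffic finds it -- can be illustrated with a deliberately tiny stand-in. The sketch below uses a smoothed character-bigram model instead of the paper's recurrent networks; the class name and smoothing choice are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
import math

class CharBigramLM:
    """Tiny character-bigram LM for scoring log lines; a minimal
    stand-in for the recurrent models described in the abstract."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing constant
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def train(self, lines):
        # "^" and "$" mark line start/end, mirroring sentence boundaries.
        for line in lines:
            for prev, cur in zip("^" + line, line + "$"):
                self.counts[prev][cur] += 1
                self.vocab.update((prev, cur))

    def anomaly_score(self, line):
        """Average negative log-probability per character:
        higher means the line looks less like normal traffic."""
        vocab_size = len(self.vocab) or 1
        pairs = list(zip("^" + line, line + "$"))
        nll = 0.0
        for prev, cur in pairs:
            total = sum(self.counts[prev].values())
            p = (self.counts[prev][cur] + self.alpha) / (total + self.alpha * vocab_size)
            nll -= math.log(p)
        return nll / len(pairs)
```

In use, lines whose score exceeds a threshold (tuned on held-out normal data) would be flagged for analyst review; the paper's character-based recurrent models play the same role with far richer context.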


What Does It Mean for AI to Understand?

#artificialintelligence

Remember IBM's Watson, the AI Jeopardy! champion? A 2010 promotion proclaimed, "Watson understands natural language with all its ambiguity and complexity." However, as we saw when Watson subsequently failed spectacularly in its quest to "revolutionize medicine with artificial intelligence," a veneer of linguistic facility is not the same as actually comprehending human language. Natural language understanding has long been a major goal of AI research. At first, researchers tried to manually program everything a machine would need to make sense of news stories, fiction or anything else humans might write.


What changes OpenAI's GPT-3 and other models brought to us

#artificialintelligence

In June last year, OpenAI released GPT-3. Composed of 175 billion parameters and trained at a cost of tens of millions of dollars, it was the largest artificial intelligence language model ever produced. From answering questions to writing articles and poems, and even writing in slang, everything is covered. The full name of GPT-3 is Generative Pretrained Transformer-3: the third in the series of generative pretrained transformers, more than 100 times larger than 2019's GPT-2. GPT-3 has 175 billion parameters, while the second largest language model has 17 billion parameters.


LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring

Beck, Eugen, Zhou, Wei, Schlüter, Ralf, Ney, Hermann

arXiv.org Machine Learning

LSTM based language models are an important part of modern LVCSR systems as they significantly improve performance over traditional backoff language models. Incorporating them efficiently into decoding has been notoriously difficult. In this paper we present an approach based on a combination of one-pass decoding and lattice rescoring. We perform decoding with the LSTM-LM in the first pass but recombine hypotheses that share the last two words; afterwards we rescore the resulting lattice. We run our systems on GPGPU equipped machines and are able to produce competitive results on the Hub5'00 and Librispeech evaluation corpora with a runtime better than real-time. In addition we briefly investigate the possibility of carrying out the full sum over all state-sequences belonging to a given word-hypothesis during decoding without recombination.
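The recombination step described in the abstract -- merging first-pass hypotheses that share their last two words so the search space stays tractable -- can be sketched as follows. The function name and the (words, score) pair representation are illustrative assumptions, not the paper's actual decoder data structures.

```python
def recombine(hypotheses):
    """Keep only the best-scoring hypothesis per last-two-words key.

    `hypotheses` is a list of (word_sequence, score) pairs, where a
    higher score is better. Hypotheses sharing their final two words
    are merged, which bounds how many LSTM-LM states the decoder
    must carry forward; the lattice is rescored afterwards to
    recover detail lost in the merge.
    """
    best = {}
    for words, score in hypotheses:
        key = tuple(words[-2:])  # hypotheses sharing the last two words merge
        if key not in best or score > best[key][1]:
            best[key] = (words, score)
    return list(best.values())
```

Keying on only the final two words is what makes this lossy, which is why the approach pairs it with a second rescoring pass over the resulting lattice.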


IBM Sets New Transcription Performance Milestone on Automatic Broadcast News Captioning

#artificialintelligence

Two years ago IBM set new performance records on conversational telephone speech (CTS) transcription, by benchmarking its deep neural network based speech recognition systems on the Switchboard and Callhome corpora, two popular publicly available data sets for automatic speech recognition [1]. Here we show that this impressive performance holds on other audio genres. Similar to the CTS benchmarks, the industry has for many years evaluated system performance on multimedia audio signals with broadcast news (BN) captioning. We have now achieved a new industry record of 6.5% and 5.9% on two BN benchmarks: RT04 and DEV04F [2]. Both these test sets have been released in the past by the Linguistic Data Consortium (LDC) [3].


Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

Kumar, Shankar, Nirschl, Michael, Holtmann-Rice, Daniel, Liao, Hank, Suresh, Ananda Theertha, Yu, Felix

arXiv.org Machine Learning

Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks. However, these models are computationally more expensive than N-gram LMs for decoding, and thus, challenging to integrate into speech recognizers. Recent research has proposed the use of lattice-rescoring algorithms using RNNLMs and LSTMLMs as an efficient strategy to integrate these models into a speech recognition system. In this paper, we evaluate existing lattice rescoring algorithms along with new variants on a YouTube speech recognition task. Lattice rescoring using LSTMLMs reduces the word error rate (WER) for this task by 8% relative to the WER obtained using an N-gram LM. Index Terms: LSTM, language modeling, lattice rescoring, speech recognition. A language model (LM) is a crucial component of a statistical speech recognition system [1]. Because N-gram LMs condition only on short-range contexts, they are powerful for tasks such as voice-search where short-range contexts suffice, but they do not perform as well at tasks such as transcription of long-form speech content, which require modeling of long-range contexts [2].
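The rescoring strategy the abstract describes -- apply the expensive LSTM-LM only to a compact set of first-pass hypotheses rather than during decoding -- can be sketched on an n-best list, a simple pruned stand-in for a lattice. The function name, the interpolation weight, and the `lstm_lm_logprob` callback are illustrative assumptions, not the paper's actual algorithms.

```python
def rescore_nbest(nbest, lstm_lm_logprob, lm_weight=0.5):
    """Rescore an n-best list (a pruned stand-in for a lattice).

    Each entry in `nbest` is a (word_sequence, first_pass_score)
    pair; `lstm_lm_logprob` is assumed to map a word sequence to
    its total log-probability under the stronger LM. The combined
    score re-ranks the hypotheses, and higher is better.
    """
    rescored = []
    for words, first_pass_score in nbest:
        total = first_pass_score + lm_weight * lstm_lm_logprob(words)
        rescored.append((words, total))
    return sorted(rescored, key=lambda h: h[1], reverse=True)
```

True lattice rescoring is more involved -- hypotheses share arcs, so LM states must be expanded or approximated along lattice paths -- but the cost structure is the same: the neural LM is consulted only after the cheap first pass has pruned the search space.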


Sentences with style and topic

#artificialintelligence

In this week's post we will have a closer look at a paper dealing with the modeling of style, topic and high-level syntactic structure in language models by introducing global distributed latent representations. In particular, the variational autoencoder seems to be a promising candidate for pushing generative language models forward and including global features. Recurrent neural network language models are known to be capable of modeling complex distributions over sequences. However, their architecture limits them to modeling local statistics over sequences, and therefore global features have to be captured otherwise. Non-generative language models include the standard recurrent neural network language model, which predicts words depending on previously seen words and does not learn a global vector representation of the sequence at any time.